Wide-Area Computing: Resource Sharing on a Large Scale

Computing over wide-area networks has been largely ad hoc, but as needs increase, piecemeal solutions no longer make sense. Legion, a network-level operating system, was designed from scratch to target wide-area computing demands.
Consider almost any computing resource today, whether hardware, software, or data, and it will invariably be networked. Networking, especially wide-area networking, has created dramatic new possibilities for resource sharing. Cooperating contractors want selected access to each other's enterprise systems. Researchers in geographically distant universities need to pool and analyze data from multisite experiments. Legacy codes on different computing platforms must exchange information to support data mining and other integrated applications.

These new possibilities depend on the ability to manage shared resources. But the sheer complexity of networked environments can turn this management problem into a nightmare. How do you share and manage resources yet maintain the autonomy of multiple administrative domains, hide the differences between incompatible computer architectures, communicate consistently as machines and network connections are lost, and respect overlapping security policies? The usual approach to these problems has been to deal with each situation individually. Piecemeal solutions are cobbled together from scripts, sockets, and various networking tools. If all goes well, a sophisticated programmer can build and maintain the application, but even then the implementation tends to be brittle and limited.

Resource management is traditionally an operating system problem, but large-scale collections of resources transcend classic operating system boundaries. What is needed is a wide-area operating system that can abstract over a complex set of resources and provide a high-level way to share and manage them over the network. To be effective, such a system must address the challenges posed by real end-user applications (see the sidebar "Challenges for a Wide-Area Operating System"). Scalability, security, and fault tolerance are just a few of the characteristics a viable solution must have.

Five years ago, we set out to design and build a wide-area operating system that would encompass all these challenges, allowing multiple organizations with diverse platforms to share and combine their resources. Our system, Legion (http://legion.virginia.edu), is now operational on hundreds of hosts across nine US sites, including the two NSF supercomputer centers (San Diego Supercomputer Center and National Center for Supercomputing Applications), two DoD supercomputer centers (Naval Oceanographic Office and Army Research Laboratory), NASA's Aeronautical Research Center, and several universities. Users have ported a range of scientific applications to Legion in areas such as molecular biology, materials science, ocean and atmospheric science, electrical engineering, and computer science.

Figure 1. How the Legion wide-area operating system works. Legion Host and Vault proxy objects provide a uniform interface to heterogeneous collections of processing and storage resources. These resource proxy objects are managed by Class Manager objects. Class Managers detect and report resource faults, for example. Scheduler objects gather information about the system state; Class Managers use these Schedulers to select and access required resources. All these system components (resource objects, managers, schedulers, and application objects) are addressable in a single, systemwide, uniform object space.

Legion is essentially a conduit between the end user and widely distributed collections of resources. Like a traditional operating system, it supports services such as resource management and a distributed filesystem. This operating system-style interface leverages application programmer experience and simplifies porting legacy applications to the Legion platform. However, unlike typical operating systems, Legion is layered on top of existing software services. It uses the existing operating systems, resource management tools, and security mechanisms at host sites to implement higher level system-wide services. Because of this middleware approach, Legion is able to reuse local services, and sites do not have to change familiar local software interfaces.

Legion is a component-based system: Distributed application components are represented as independent, active objects. This approach greatly simplifies the development of distributed applications and tools. Instead of facing the complexity of a wide range of distributed resources and service interfaces, the programmer works with the simple, uniform abstraction of distributed objects. Legion also supports a high level of site autonomy. Local sites can select and configure the components that represent their resources and services in any way they see fit, retaining complete control over local access control policies, resource quota mechanisms, and so on. Legion's inherent flexibility is its greatest strength and its most important defining characteristic.
HOW LEGION WORKS

Because its components must interoperate in wide-area heterogeneous environments, Legion's fundamental object model resembles the Common Object Request Broker Architecture (CORBA). Programmers describe object interfaces in an interface description language (IDL) and then compile and link them to implementations in programming languages such as C++, Java, or Fortran. All system elements are objects and can communicate with one another regardless of location, heterogeneity, or implementation details. Within this object-based framework, Legion provides the services of a distributed operating system. Figure 1 outlines the structure of a sample Legion system.

The easiest way to understand how Legion works is to consider how it handles classic operating system tasks, which we discuss in turn.

Representing and managing resources

As Figure 1 shows, local sites use Host and Vault objects to represent processors and storage, respectively. Using objects to represent resources has two primary benefits:

• Objects define a simple, consistent interface to Legion's resources. Hosts provide the uniform interface for creating objects (tasks); Vaults provide the uniform interface for allocating persistent storage. These interfaces provide a consistent view of system resources, even though local resource interfaces differ significantly in practice.

• The resource object model provides a tremendous degree of site autonomy. Applications (acting as resource clients) use the generic object interfaces for the resources they require. Resource providers are free to employ any desired implementation of the resource objects.
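A minimal Python sketch can make these two benefits concrete. The class and method names (`Host`, `Vault`, `create_object`, `allocate`, `TrustedUsersHost`, `QueueAwareHost`, `UnixAccountVault`) are hypothetical stand-ins rather than Legion's actual IDL interfaces. Client code is written only against the generic interfaces, while each site plugs in its own implementation, such as an access-controlled Host, a queue-aware Host, or a Vault that charges storage to local Unix user-ids.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections import deque


class Host(ABC):
    """Hypothetical uniform interface for a processing resource."""

    @abstractmethod
    def create_object(self, user: str, task: str) -> str:
        """Start a task on this host and return a handle for it."""


class Vault(ABC):
    """Hypothetical uniform interface for persistent storage."""

    @abstractmethod
    def allocate(self, user: str, nbytes: int) -> str:
        """Reserve storage for an object's persistent state."""


class DirectHost(Host):
    """Default implementation: run the task immediately."""

    def __init__(self, name: str):
        self.name = name
        self.running: list[str] = []

    def create_object(self, user: str, task: str) -> str:
        self.running.append(task)
        return f"{self.name}:{task}"


class TrustedUsersHost(DirectHost):
    """Site extension: enforce a local access control policy first."""

    def __init__(self, name: str, allowed: set[str]):
        super().__init__(name)
        self.allowed = allowed

    def create_object(self, user: str, task: str) -> str:
        if user not in self.allowed:
            raise PermissionError(f"{user} may not run tasks on {self.name}")
        return super().create_object(user, task)


class QueueAwareHost(Host):
    """Site extension: hand tasks to a local batch queue instead of running them."""

    def __init__(self, name: str):
        self.name = name
        self.queue: deque[str] = deque()

    def create_object(self, user: str, task: str) -> str:
        self.queue.append(task)  # a real site would submit to its queue system here
        return f"{self.name}:queued:{task}"


class UnixAccountVault(Vault):
    """Site extension: charge storage to a local Unix user-id per client."""

    def __init__(self, uid_map: dict[str, int]):
        self.uid_map = uid_map
        self.usage: dict[int, int] = {}

    def allocate(self, user: str, nbytes: int) -> str:
        uid = self.uid_map[user]
        self.usage[uid] = self.usage.get(uid, 0) + nbytes
        return f"uid{uid}:{nbytes}"


def launch(host: Host, user: str, task: str) -> str:
    """Client code sees only the generic Host interface."""
    return host.create_object(user, task)
```

For example, `launch(TrustedUsersHost("h1", {"alice"}), "alice", "sim")` returns `"h1:sim"`, while the same call for an unlisted user raises `PermissionError`; the client function `launch` never changes, only the site's choice of implementation does.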

The second benefit is particularly significant. For example, if system administrators at a site want to enforce a specialized access control policy for their local hosts, they can extend or replace the basic Host implementation to enforce that policy. Similarly, some of the hosts in Legion systems may require access through a local queue management system such as Genias Software's Codine (http://www.genias.de) or IBM's LoadLeveler (http://www.rs6000.ibm.com/software/sp_products/loadlev.html). In these cases, resource providers simply use extended, queue-aware Host objects. Likewise, if a resource provider makes storage in a local filesystem available to Legion, yet wants to continue using local Unix-based accounting and quota tools, he can use a Vault object implementation that allocates storage under the appropriate local Unix user-id for each Legion client.

Legion provides configurable default implementations of the basic resource objects, so resource providers generally need not write any code to make their resources available. However, through object extension and replacement, Legion is flexible enough to support new local resource interfaces and policies as they arise.

Challenges for a Wide-Area Operating System

At Boeing Company, designers use simulation to make ever more complex airframes at a manageable cost. Pratt & Whitney, which designs and supplies jet engines to Boeing, also relies heavily on simulation. When Boeing's engineers simulate an airframe's behavior, they need to know how the engine coupled to that airframe will perform under various conditions. However, Pratt & Whitney cannot release its proprietary engine simulations because of the significant intellectual property they encode. In an unwieldy information exchange process, Boeing engineers must ask Pratt & Whitney engineers to run their simulation at specified data points and then send them results by tape. Boeing engineers then combine the information with their own simulation data and modify it accordingly. The process iterates.

In a completely different domain, Harvard Medical School researches the causes and symptoms of multiple sclerosis. Its researchers need to get MRI scans from multiple partner institutions and to make a database of image-processed results available to the partners. As a first step, they want a tool that can automatically identify pertinent MRI scans at partner hospitals, securely move those scans over the Internet to Harvard, and then process them. The partners will provide very little administrative support for the tool.

In another medical setting, seven competing Dayton, Ohio, hospitals are working together to reduce costs. By sharing patient records and making them electronically available to emergency room physicians, they avoid expensive and time-consuming tests and can provide better care more quickly. Each hospital has its own legacy medical records system, IS personnel, and procedures that must somehow be merged. However, each also has databases and programs that cannot be shared.

Finally, climate modeling groups at San Diego Supercomputer Center, UCLA, and Lawrence Berkeley Laboratory want to couple a global atmospheric circulation model with a regional, mesoscale weather model. The coupled models would feed data to each other, creating more accurate and detailed combined results. The existing regional model runs only on a Cray T90, while the global model runs on a Cray T3E and is being migrated to the IBM SP. The applications need a way to coordinate and exchange data with one another at runtime, be scheduled to run simultaneously on separate supercomputers, and be easily controlled by a researcher at a single workstation.

These applications characterize the spirit of wide-area computing. Some of the requirements are unique, while others overlap. The applications also illustrate the following significant challenges, from managing complexity to implementing flexible, robust security.

Provide a high-level programming model

Complexity is the programmer's nemesis: A large-scale system can comprise several different architectures, tens of sites, hundreds of applications, and potentially thousands of hosts. Reducing and managing complexity is therefore critical. The object-oriented paradigm and object-based programming provide programmers and application designers with encapsulation features and tools for abstraction that reduce and compartmentalize complexity. We firmly believe that object-based techniques are key to constructing robust, wide-area systems.

These techniques are not enough, however. Composable, high-level services must replace low-level interfaces such as rsh and sockets in the programmer's toolbox. Without such services, the complexity of distributed programming goes up dramatically, increasing both the skill set required to build applications and the fragility of the resulting software.

Offer a single system image

To combat the daunting number of distinct hosts and file systems, programmers need a single system image: the abstraction of a single machine and associated storage. For some, a "single system image" means a single shared address space; for others, the ability to run ps and get a list of all processes throughout the system. We define a single system image as a universal name space and management infrastructure for all objects of interest to the system and its users: files, processes, processors (hosts), storage, users, services, and so on. The names should be location independent (not contain any location information) and should be usable from anywhere in the system. Further, as programmers use resources to create their own objects, they should not be forced to explicitly place objects on a particular host or disk; the system should handle that. Thus, the programmer or user can specify or know an object's location when necessary, but if this information is not relevant to his task, he can ignore it.

Accommodate diverse administrative policies

Most wide-area computing requires joining multiple organizations and administrative domains. To make this bridging easy, the system must accommodate a diverse set of local-use policies, access control policies, and computational cultures. For example, a site might insist that users authenticate via Kerberos before using its resources, or that users sign an "acceptable use policy" statement, or that each day from 1:00 p.m. to 6:00 p.m. no applications can be run that consume more than five CPU minutes. Extensibility and flexibility thus become essential: users must be able to readily extend and configure the system to satisfy local requirements.

Manage heterogeneous resources

Resource heterogeneity is a natural part of the distributed environment. Types of heterogeneity include processor, data format, configuration (how much memory and disk? which libraries are available on a host?), and operating system. If heterogeneity is not managed, individual users and programmers must deal with the complexity induced by all the possible permutations of hardware, operating system, and resources, a task that can rapidly overwhelm even the best programmers.

Managing tasks and objects

Traditional operating systems must provide interfaces for starting new tasks and controlling their execution (suspend, resume, terminate, and so on). In Legion, the notion of a task or process corresponds closely to the Legion object: Objects are the active computational entities within the system. Legion encapsulates object management functions in the Class Manager object type. Class Managers have three main functions:

• They support a consistent interface for object management. The Class Manager interface includes a natural set of object (or task) management operations, such as methods to create and destroy objects. Each Class Manager is responsible for a set of instances, which clients control through the Class Manager interface. Class Managers act as policymakers for their instances. For example, an object's Class Manager determines which resources the object may use, and might enforce a policy that lets instances run only on a known set of trusted hosts.

• They actively monitor their instances. Class Managers query the status of their instances to detect failures and coordinate failure response (see Figure 1). In this role, Class Managers act as a distributed, agent-based fault-detection and response mechanism within Legion.

• They support persistence. All Legion objects can be persistent, existing arbitrarily beyond the life of their creating program. When an object is not in use, it can be deactivated: Its state is saved to stable storage, and its containing process is deallocated (to conserve resources). This notion of object activation/deactivation is similar to traditional operating systems temporarily swapping out a job. To make object deactivation transparent to clients, the Class Manager acts as an automatic reactivation agent. If a client attempts to invoke a method on an inactive object, the Class Manager automatically reactivates it. Reactivation is thus as transparent as resuming swapped-out processes in traditional systems.

Decomposing object management responsibilities into an arbitrary number of Class Managers provides a natural distribution of the system's object management activities. Also, because Class Managers are extensible, replaceable objects, it is easy to customize the system's object management mechanisms. For example, to enable certain forms of failure resilience, some Legion classes use replication. The specialized Class Managers used for these object classes create and manage replicas transparently to clients.
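The three Class Manager functions, together with transparent reactivation, can be sketched as a toy, single-process Python model. Everything here is an assumption for illustration: `ClassManager`, `Instance`, the `trusted_hosts` placement policy, and the in-memory `stable_storage` dictionary are hypothetical, and saving state to a dictionary stands in for writing it to a Vault.

```python
from __future__ import annotations

import itertools
from collections.abc import Callable


class Instance:
    """A toy active object whose only state is a counter."""

    def __init__(self, state: int = 0):
        self.state = state

    def increment(self) -> int:
        self.state += 1
        return self.state


class ClassManager:
    """Hypothetical manager for one class: lifecycle, policy, monitoring, persistence."""

    def __init__(self, trusted_hosts: set[str]):
        self.trusted_hosts = trusted_hosts
        self.active: dict[str, Instance] = {}      # instances with a live process
        self.stable_storage: dict[str, int] = {}   # saved state of deactivated instances
        self.placement: dict[str, str] = {}        # instance -> host
        self.failed: set[str] = set()
        self._ids = itertools.count(1)

    def create(self, host: str) -> str:
        # Policy: instances may run only on a known set of trusted hosts.
        if host not in self.trusted_hosts:
            raise PermissionError(f"{host} is not trusted for this class")
        oid = f"obj{next(self._ids)}"
        self.active[oid] = Instance()
        self.placement[oid] = host
        return oid

    def destroy(self, oid: str) -> None:
        self.active.pop(oid, None)
        self.stable_storage.pop(oid, None)
        self.placement.pop(oid, None)

    def monitor(self, is_alive: Callable[[str], bool]) -> set[str]:
        # Query each active instance; remember the ones that fail to respond.
        for oid in self.active:
            if not is_alive(oid):
                self.failed.add(oid)
        return self.failed

    def deactivate(self, oid: str) -> None:
        # Save state to stable storage and release the process, like swapping out a job.
        self.stable_storage[oid] = self.active.pop(oid).state

    def invoke(self, oid: str, method: str):
        # Transparent reactivation: a call on an inactive object first restores it.
        if oid not in self.active:
            if oid not in self.stable_storage:
                raise LookupError(f"no such object: {oid}")
            self.active[oid] = Instance(self.stable_storage.pop(oid))
        return getattr(self.active[oid], method)()
```

A client that calls `invoke(oid, "increment")` after the object has been deactivated simply gets the next counter value; the save-and-restore step is invisible to it, which is the property the text attributes to Class Managers.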

Naming 

Naming is a basic interface issue in operating system design. For example, operating systems typically define a name space for identifying processes (such as Unix PIDs), as well as a file system name space for identifying files and directories. Legion represents all entities (files, processors, storage devices, networks, users, and so on) as objects. These objects are identified by a three-level naming scheme. At the lowest level, each object is assigned an object address: a list of network addresses for the object. An object address might contain an IP address and port number, for example. Because Legion objects can migrate, object addresses change over time. Legion thus defines an intermediate layer of location-independent names called Legion object identifiers (LOIDs). LOIDs are globally unique identifiers that are assigned to objects when they are created. Because they are binary, system-assigned names, they are not convenient for users. To address this deficiency, Legion supports a hierarchical directory service, context space, which lets users assign arbitrary Unix-like string paths to objects.

The Legion naming mechanism reduces the complexity of designing distributed applications because it provides a single global name space for all system entities. A typical distributed environment supports separate name spaces for files, hosts, and processes; Legion, in contrast, supports the same global name space for all these as well as additional entities. The interface to this global name space is very easy to use; at the highest level (context space) the user manipulates names in the familiar form of Unix-style paths. Furthermore, Legion's scalable, replicated binding services make name translation automatic and efficient.¹
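The three naming levels can be made concrete with a small Python sketch. The data structures are assumptions for illustration: `ContextSpace` and `BindingService` are hypothetical names, LOIDs are modeled as random 128-bit values rather than Legion's actual binary format, and object addresses are plain (IP address, port) tuples.

```python
from __future__ import annotations

import uuid

Address = tuple[str, int]  # (IP address, port)


class BindingService:
    """Level 2 to level 1: maps location-independent LOIDs to current object addresses."""

    def __init__(self):
        self._bindings: dict[bytes, list[Address]] = {}

    def bind(self, loid: bytes, addresses: list[Address]) -> None:
        self._bindings[loid] = addresses

    def resolve(self, loid: bytes) -> list[Address]:
        return self._bindings[loid]


class ContextSpace:
    """Level 3 to level 2: maps Unix-style string paths to LOIDs."""

    def __init__(self):
        self._paths: dict[str, bytes] = {}

    def add(self, path: str, loid: bytes) -> None:
        self._paths[path] = loid

    def lookup(self, path: str) -> bytes:
        return self._paths[path]


def new_loid() -> bytes:
    # Assumption: a random UUID stands in for a globally unique, system-assigned LOID.
    return uuid.uuid4().bytes


def locate(path: str, ctx: ContextSpace, binder: BindingService) -> list[Address]:
    """Translate a human-readable path all the way down to network addresses."""
    return binder.resolve(ctx.lookup(path))
```

When an object migrates, only its binding changes: rebinding the same LOID to a new address makes `locate` return the new location, while the user's path and the object's LOID stay exactly as they were.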

Providing an extensible file system

Traditional operating systems typically rely on a file system to manage and represent persistent storage. However, Legion's global name space and persistent object model make a separate file system unnecessary; in practice, the generalized persistent object space defined by Legion serves all the purposes of conventional filesystems. In Legion's "filesystem," users see familiar elements such as paths, directories, and universally accessible files, but they also see arbitrary object types such as Hosts, Class Managers, and application tasks.


Grow without limits

The system must be able to add new hosts and resources over time. If the past has shown us anything, it is that the number of interconnected computational resources will only increase. Users and organizations do not want arbitrary limits on system size and capacity. System architectures must therefore be scalable and conform to the distributed systems principle that "the amount of service required of any single component of the system must not grow as the system grows." If an architecture does not conform, a component whose load (requests per second, for example) increases as the system expands will at some point become saturated, and performance will suffer.

Tolerate faults

Several years ago Leslie Lamport quipped, "A distributed system is one in which I cannot get something done because a machine I've never heard of is down." This indictment is driven by a simple fact: Without mechanisms to deal with failure, application availability is the product of component availability. In today's business climate, an unavailable application can easily cost thousands of dollars per minute. A wide-area system must therefore be resilient to failure and provide a failure and recovery model and associated services to application developers, so they can write robust applications. The model must include notions of fault detection, fault propagation, and a set of useful failure mode assumptions.
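The product rule can be written out directly. It assumes, beyond what the text states, that components fail independently and that the application needs every one of its n components, with component i available a fraction a_i of the time:

```latex
A_{\text{app}} = \prod_{i=1}^{n} a_i ,
\qquad \text{e.g.}\quad a_i = 0.99,\; n = 10
\;\Rightarrow\; A_{\text{app}} = 0.99^{10} \approx 0.904 .
```

With 100 such components the product falls to roughly 0.366, so without failure handling, every resource an application depends on lowers its availability.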

Handle multilanguage and legacy applications

"I don't know what computer language they'll be using in a hundred years, but it will be called Fortran" was a popular refrain in the 1980s. Hundreds of millions of lines of legacy code today are written in languages as varied as Lisp, RPG, Cobol, assembler, C/C++, Java, and (of course) Fortran. One thing is certain: Those codes will not be replaced overnight, and we will still want to be able to run them in distributed environments. The implication is that there must be a mechanism for supporting legacy code without modification, and it must be able to support a variety of programming languages. A wide-area computing environment must be language-neutral.

Implement flexible, robust security

Security spans a range of topics, including authentication (how do I know who you are?), access control (who can do what to each resource?), and data privacy and integrity (how can I make sure no one can read or modify my data in memory, on disk, or on the network?). Each of these issues arises in the Boeing/Pratt & Whitney example. Clearly we must be able to provide high levels of security, but there is more to the problem. Security can be costly in performance, capability restriction, and other dimensions. Moreover, different users and organizations want to enforce very different policies. The challenge is to provide each user and organization with just the right mechanism and policy rules but still to allow different users and organizations to interact.
Because it can hold arbitrary object types, Legion's object space is more flexible than conventional file systems. For example, users can customize individual files to better suit application-specific behaviors such as specialized file access patterns. Consider a file that contains a two-dimensional grid of data items. In a traditional file interface, accessing a single grid row or column might require multiple file operations (in a row-major layout, for instance, a column's elements are scattered across the file). In Legion, users can define an extended file type to represent the 2D file object, with additional methods to permit row and column access.
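A minimal Python sketch of such an extended file type follows, assuming row-major storage of 8-byte little-endian floats in an in-memory byte buffer; `GridFile`, `read_row`, and `read_column` are hypothetical names. A row is one contiguous read, while a column needs one read per row. Putting `read_column` inside the object hides that strided access from clients, and a remote client could then fetch a whole column with a single method call.

```python
from __future__ import annotations

import io
import struct


class GridFile:
    """Hypothetical 2D file object storing doubles in row-major order."""

    ITEM = struct.calcsize("<d")  # 8 bytes per item

    def __init__(self, rows: int, cols: int):
        self.rows, self.cols = rows, cols
        self._buf = io.BytesIO(bytes(rows * cols * self.ITEM))  # zero-filled storage

    def _offset(self, r: int, c: int) -> int:
        return (r * self.cols + c) * self.ITEM

    def write(self, r: int, c: int, value: float) -> None:
        self._buf.seek(self._offset(r, c))
        self._buf.write(struct.pack("<d", value))

    def read_row(self, r: int) -> list[float]:
        # A row is contiguous: a single seek and read.
        self._buf.seek(self._offset(r, 0))
        data = self._buf.read(self.cols * self.ITEM)
        return list(struct.unpack(f"<{self.cols}d", data))

    def read_column(self, c: int) -> list[float]:
        # A column is strided: one seek and read per row, done inside the object.
        column = []
        for r in range(self.rows):
            self._buf.seek(self._offset(r, c))
            column.append(struct.unpack("<d", self._buf.read(self.ITEM))[0])
        return column
```

Filling a 3 x 4 grid with `10 * r + c` and calling `read_column(2)` yields `[2.0, 12.0, 22.0]`.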

Enabling interprocess communication

At the lowest level, Legion objects communicate via message passing to transmit method parameters and results. However, applications for wide-area systems need tools to reduce communication and to tolerate high latencies. To address these requirements, Legion supports macrodataflow, a variation of the traditional remote method invocation model.

Like other asynchronous remote method mechanisms, macrodataflow permits multiple concurrent invocations and lets users overlap remote methods and local computation. However, unlike other remote method protocols, macrodataflow forwards method results directly to data-dependent receivers. For example, if the caller does not directly use the result of a remote method, but needs it only as a parameter for future invocations, the caller will never receive the result. The macrodataflow protocol avoids the unnecessary act of communicating the result back to the caller, and instead forwards the message directly to the objects where it is needed. This removes a trip through the caller, which matters when every hop crosses a high-latency wide-area link.

Legion fully automates the macrodataflow protocol. Clients can specify and execute program graphs of interdependent remote method invocations using macrodataflow library interfaces, or via Legion-aware compilers such as the Mentat Programming Language Compiler.² Similarly, object developers need not be aware of macrodataflow; Legion automatically matches incoming method parameters from multiple sources into complete method invocations, and forwards outgoing results directly to data-dependent recipients.
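The forwarding idea can be illustrated with a toy, single-process Python simulation. It is not Legion's protocol or library interface; `Node`, `Graph`, and the `messages` log are hypothetical. Each node fires once all of its parameters have been matched, and its result goes straight to the dependent node's parameter slot. In this simplification, only nodes with no dependents send their results back to the caller, and the log records every hop so the saving is visible.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class Node:
    """One remote method invocation in a program graph."""
    name: str
    func: Callable[..., float]
    arity: int
    args: dict[int, float] = field(default_factory=dict)            # matched parameters
    targets: list[tuple[str, int]] = field(default_factory=list)    # (node, parameter slot)


class Graph:
    def __init__(self):
        self.nodes: dict[str, Node] = {}
        self.messages: list[tuple[str, str]] = []   # (sender, receiver) for each hop
        self.results_for_caller: dict[str, float] = {}

    def add(self, name: str, func: Callable[..., float], arity: int) -> None:
        self.nodes[name] = Node(name, func, arity)

    def connect(self, src: str, dst: str, slot: int) -> None:
        self.nodes[src].targets.append((dst, slot))

    def send(self, sender: str, dst: str, slot: int, value: float) -> None:
        self.messages.append((sender, dst))
        node = self.nodes[dst]
        node.args[slot] = value
        if len(node.args) == node.arity:  # all parameters matched: fire the method
            result = node.func(*(node.args[i] for i in range(node.arity)))
            if node.targets:
                # Forward the result directly to data-dependent receivers.
                for target, target_slot in node.targets:
                    self.send(dst, target, target_slot, result)
            else:
                # Only final results travel back to the caller.
                self.messages.append((dst, "caller"))
                self.results_for_caller[dst] = result
```

Computing (1 + 2) * (3 + 4) with two addition nodes feeding a multiplication node takes seven messages here: four parameters from the caller, two forwarded sums, and one final result. If each sum first returned to the caller, which then invoked the multiplication, the same computation would take nine.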

How Legion Differs from ...

Common Object Request Broker Architecture

CORBA 3.0 defines communication protocols, naming and binding mechanisms, invocation methods, persistence, and many other features and services essential for an object-based architecture.¹ As such, its feature set and Legion's overlap in many areas.

The two architectures differ in their underlying emphasis, however. CORBA was initially a reaction to the software integration problem. Differences between software components in location, vendor, implementation language, or execution platform made building integrated applications difficult if not impossible. CORBA developers focused on enabling interoperability, and the architecture provides a common, object-based playing field where components can communicate and interact.

In contrast, Legion began with fundamental computing resources on a wide-area network (CPU, disk, data, and so on) and built an overarching framework for them. It emphasizes the ability to manage and reason about resources. The goal was to reconstruct a coherent computing environment with core operating system capabilities over a complex, heterogeneous environment. Thus, Legion can be used simply for its high-level operating system services to run, schedule, and manage legacy applications in a network, but it can also mimic the CORBA standard for integrating applications. These two aspects combined give Legion its real power.

As CORBA evolves, some operating system-type services are starting to be defined for it. Scalability and other wide-area concerns are becoming more important. It remains to be seen how well its architecture will accommodate these changes.

Globe

The Globe project² at Vrije University also shares many goals and attributes with Legion. Both occupy middleware roles (running on top of existing host operating systems and networks), both support implementation flexibility, both have a single uniform object model and architecture, and both use objects to abstract implementation details. However, the object models of the two systems differ in many respects. Globe objects are passive and are physically distributed over potentially many resources, whereas Legion objects are active, independent entities. Because of this difference, Legion provides a more unified view of system components. Whereas in Globe there is a dichotomy between objects and processes, in Legion, objects are themselves the units of computation, providing the basis for distribution, scheduling, and resource management.

Globe and Legion both provide a platform for constructing applications based on interoperable components. But Legion differs significantly in also providing an integrated infrastructure for resource management. This hallmark of a wide-area operating system is essential for large-scale resource sharing.

Globus

The Globus project³ at Argonne National Laboratory and the University of Southern California has the same base of target environments, technical objectives, and target end users as Legion, and shares some of its design features. However, Globus and

Protecting resources and applications

Wide-area operating systems must protect the security of both local resource providers and application users. Resource providers require that the wide-area operating system manage local resources in accordance with local policies. Application programmers must satisfy the security requirements of their applications.

Legion's  security  mechanisms  are  an  integral  part 
of  its  object  architecture.  The  basic  Legion  security 
service  is  user-selectable  data  privacy  and  integrity 
within  the  Legion  message-passing  layer.  Legion  lets 
messages  be  fully  encrypted  for  privacy,  digested  and 
signed  for  integrity  checking,  or  sent  in  theclear  if  low 
performance  overhead  is  an  application  priority. 
Cryptographic  services  in  Legion  are  based  on  theRSA 
public  key  system  (http://www.rsa.com).  To  protect 
against  certain  kinds  of  public  key  tampering,  objects 
encode  their  RSA  public  keys  directly  into  their 
LOIDs.  Simply  by  knowing  an  object's  LO  ID,  a  client 
can  communicate  securely  with  that  object. 

In  any  operating  system,  access  control  and  resource 
protection  are  central  issues.  In  Legion,  all  resources 
are  represented  by  objects,  so  access  control  and 
resource  protection  are  specified  entirely  at  theobject 
level.  Invoked  objects  autonomously  enforce  access 
control  invocation  by  invocation,  using  a  mandatory 
internal  method  called  M  ayl.  When  a  method  invo¬ 
cation  arrives  at  an  object,  it  is  first  processed  by  the 
object's  M  ayl  method,  which  can  enforcean  arbitrary 
access  control  policy.  Typically,  M  ayl  makes  access 


control  decisions  on  the  basis  of  credentials  passed 
along  with  method  parameters.  Credentials  consist  of 
afree-form  set  of  rights  signed  by  a  responsibleclient. 
The  default  M  ayl  implementation  is  based  on  user- 
configurable  access  control  lists,  including  the  notion 
of  groups. 

In  addition  to  access  control  mechanisms,  operating 
systems  must  define  mechanisms  for  user  identity  and 
authentication.  Users  (like  all  other  Legion  entities) 
are  represented  by  objects,  which  are  assigned  unique 
LOIDs.  The  user's  LOID  contains  his  public  key,  but 
the  user  keeps  his  private  key  safe  through  arbitrary 
local  means,  such  as  a  smart  card.  Trusted  Legion  pro¬ 
grams  executed  by  the  user  (the  Legion  login  shell,  for 
example)  rely  on  the  user's  private  key  to  sign  appro¬ 
priate  credentials  for  outgoing  methods.  These  cre¬ 
dentials  form  the  basis  for  authenticating  the  user  and 
aretypically  used  in  conjunction  with  per-object  access 
control  lists  to  enforce  user  access  control. 

APPLICATIONS  OF  LEGION 

Legion's  services  can  accommodate  a  variety  of 
domains  and  platforms.  Two  current  applications 
illustrate  its  flexibility  in  supporting  distributed  enter¬ 
prise  computing. 


Legion  have  fundamentally  different  high- 
level  objectives.  Globus  provides  a  basic  set 
of  services  that  lets  users  write  applications 
for  a  wide-area  environment.  Working 
components  become  part  of  a  composite 
distributed  computing  toolkit.  Legion,  in 
contrast,  strives  to  reduce  complexity  and 
to  provide  the  programmer  with  a  single 
view  of  the  underlying  resources,  so  it 
builds  higher  level  system  functionality  on 
top  of  a  single  unified  object  model. 

TheGlobusapproach  has  several  strong 
points.  0  ne  is  that  it  takes  great  advantage 
of  codereuseand  buildson  user  knowledge 
of  familiar  tools  and  work  environments. 
Thisapproach  also  has  several  drawbacks. 
As  the  number  of  services  grows,  the  lack 
of  a  common  programming  interface  and 
model  becomes  a  significant  burden.  By 
providing  a  common  object  programming 
model  for  all  services,  Legion  permits  users 
and  tool  builders  to  combinethe  many  ser¬ 
vices  available  in  the  wide-area  operating 
system:  schedulers,  I/O  services,  applica¬ 
tion  components,  and  so  on.  For  example, 
users  can  run  the  same  access  control  tools 
to  configure  security  for  files  and  for  hosts. 
We  believe  the  long-term  advantages  of 


basing  a  system  on  a  cohesive,  compre¬ 
hensive,  and  extensible  design  outweigh  the 
short-term  advantages  of  evolutionary 
composition  of  existing  services. 

The  Web 

The Web is not a single entity whose characteristics can be isolated and analyzed. Rather, it is a broad collection of applications, protocols, and libraries focused on content delivery to end users running browsers. Advances in Web browser interfaces and functionality have driven the Web revolution, transforming it from an elitist tool to an omnipresent phenomenon. Given that the Web is most users' primary experience with distributed computing, it is important to define its role in wide-area computing.

The Web in its current form clearly does not constitute a wide-area operating system. Basic operating system issues, such as resource management and task scheduling, are simply not part of the Web's structure. This is not an indictment of the Web, but a recognition of its true strength as a remote access medium for distributed content and a ubiquitous interface technology for accessing distributed applications. As such, the Web is the perfect front end, or interface, to applications running in wide-area operating systems such as Legion. Application interfaces can be written in Java, or they may use HTML and the Common Gateway Interface (CGI). They can communicate with back-end applications using native socket protocols, HTTP, or higher level interfaces provided by the wide-area operating system. Viewed this way, the Web and wide-area operating systems such as Legion are complementary. For many users, the Web provides the most natural window into the Legion universe.
Figure 2. The MRI data collection system in development for Harvard Medical School. The components of the MRI data collection application run on central servers at Harvard and on front-end computers at the MRI centers.


MRI  data  collection 

The MRI data collection system in development for Harvard Medical School (see the first example in the sidebar "Challenges for a Wide-Area Operating System") is a good illustration of an application structure that fits well with Legion's services. The components of the MRI data collection application run on central servers at Harvard and on front-end computers at the MRI centers. Figure 2 shows the architecture. Each leaf node has an MRI collection object (blue) that scans the local disk for specially tagged MRI images that the scanner has dumped. The MRI collection object copies these images into its persistent data space so that they will not be deleted when the scanner's "dumping directory" is automatically cleaned up. Periodically, the MRI collection object calls the image processing object at Harvard (red) to upload the data in encrypted form, authenticating itself by including appropriately signed certificates in the method invocations. When it receives a complete batch of scans, the image processing object starts an image-processing pipeline, which consists of objects automatically scheduled onto local compute servers. The final stage of the processing pipeline inserts the results in the project's image database.
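The leaf-node behavior just described (scan, copy into persistent storage, upload periodically) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Legion code: the class name `MRICollector`, the `.mri` filename tag, ordinary directories standing in for the object's persistent data space, and the `uploader` callable standing in for the method invocation on the image processing object are all hypothetical, and encryption and signed certificates are omitted. A small manifest of already-sent files lets a restarted collector find its as-yet-untransmitted scans.

```python
import shutil
from pathlib import Path


class MRICollector:
    """Hypothetical sketch of the leaf-node MRI collection object."""

    def __init__(self, dump_dir, store_dir, uploader, tag=".mri"):
        self.dump_dir = Path(dump_dir)    # where the scanner dumps images
        self.store_dir = Path(store_dir)  # stands in for persistent data space
        self.store_dir.mkdir(parents=True, exist_ok=True)
        self.uploader = uploader          # callable(name, data) standing in for a method call
        self.tag = tag                    # assumed tagging convention
        self.manifest = self.store_dir / "sent.txt"
        # Recover state after a restart: anything not in the manifest is pending.
        self.sent = (set(self.manifest.read_text().splitlines())
                     if self.manifest.exists() else set())

    def scan(self):
        """Copy newly dumped, tagged images so directory cleanup cannot lose them."""
        copied = []
        for path in sorted(self.dump_dir.glob(f"*{self.tag}")):
            target = self.store_dir / path.name
            if not target.exists():
                shutil.copy2(path, target)
                copied.append(path.name)
        return copied

    def pending(self):
        """Names of stored images that have not yet been uploaded."""
        return [p.name for p in sorted(self.store_dir.glob(f"*{self.tag}"))
                if p.name not in self.sent]

    def upload(self):
        """Periodic step: push each pending image, then record it as sent."""
        for name in self.pending():
            self.uploader(name, (self.store_dir / name).read_bytes())
            self.sent.add(name)
            with self.manifest.open("a") as fh:
                fh.write(name + "\n")
```

Recording an upload in the manifest only after the call returns means a crash mid-upload leads to a resend rather than a lost scan, so the receiving side would need to tolerate duplicates.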

When a leaf node is rebooted, the node's Host object (yellow) starts automatically and registers with its manager (green) in the larger Legion net. The Class Manager object (pink) for the MRI collection component detects, by polling the green Host object Class Manager, that the node is up and requests a restart of the blue MRI collection object for that node. The yellow Host object on the node handles the request, simultaneously detecting whether the MRI collection object has been upgraded and, if so, automatically downloading the new executable. As it comes up, the MRI collection object recovers its state, which may include as-yet-untransmitted MRI scans.
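The restart sequence can be modeled as a polling loop. The sketch below is illustrative only: the classes `Host`, `HostClassManager`, and `CollectorClassManager`, the integer version numbers, and the `fetch` callback that stands in for downloading an executable are hypothetical simplifications, not Legion interfaces.

```python
class Host:
    """Hypothetical host object: starts objects, upgrading executables first."""

    def __init__(self, name):
        self.name = name
        self.installed = {}    # object type -> installed executable version
        self.running = set()   # object types currently running on this host

    def reboot(self):
        self.running.clear()   # processes die; installed executables survive

    def restart(self, obj_type, latest_version, fetch):
        """Start obj_type, downloading a newer executable if one exists."""
        if self.installed.get(obj_type) != latest_version:
            fetch(obj_type, latest_version)
            self.installed[obj_type] = latest_version
        self.running.add(obj_type)


class HostClassManager:
    """Tracks which hosts have come up and registered."""

    def __init__(self):
        self.up = {}           # host name -> Host

    def register(self, host):
        self.up[host.name] = host


class CollectorClassManager:
    """Polls the host manager and restarts the collector wherever it is missing."""

    def __init__(self, host_manager, latest_version, fetch):
        self.host_manager = host_manager
        self.latest_version = latest_version
        self.fetch = fetch     # stands in for downloading the executable

    def poll(self):
        restarted = []
        for name, host in self.host_manager.up.items():
            if "MRICollector" not in host.running:
                host.restart("MRICollector", self.latest_version, self.fetch)
                restarted.append(name)
        return restarted
```

Because the version check happens inside `restart`, an upgrade is picked up on the next restart without a separate deployment step; a second restart of an already-current host downloads nothing.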

Both the Host object and MRI collection object Class Managers have replicated persistent state. If a Class Manager goes down, its own higher-order Class Manager will detect the loss and restart it using the replica. This detection and restart behavior recurses up a tree of metamanagers (typically only one or two levels) to the root Legion manager object, which has a hot spare.
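The recursive detect-and-restart scheme amounts to each manager watching its children and restoring a failed child from its replicated state, with a hot spare covering the root. A minimal Python sketch, assuming a hypothetical `Manager` class whose `replica` field stands in for replicated persistent state and whose `sweep` method stands in for periodic failure detection:

```python
class Manager:
    """Hypothetical Class Manager or metamanager in a supervision tree."""

    def __init__(self, name, children=()):
        self.name = name
        self.alive = True
        self.state = {}
        # Replicated persistent state; it would live on other machines in
        # practice, but is kept on the object here for brevity.
        self.replica = {}
        self.children = list(children)

    def checkpoint(self):
        self.replica = dict(self.state)

    def crash(self):
        self.alive = False
        self.state = {}                  # volatile state is lost

    def restart_from_replica(self):
        self.state = dict(self.replica)
        self.alive = True

    def sweep(self):
        """Restart any failed child from its replica, then check its subtree."""
        restarted = []
        for child in self.children:
            if not child.alive:
                child.restart_from_replica()
                restarted.append(child.name)
            restarted.extend(child.sweep())
        return restarted


def active_root(primary, spare):
    """The hot spare takes over when the primary root manager is down."""
    return primary if primary.alive else spare
```

Since the spare shares the primary's children, losing the root does not leave the metamanagers unsupervised.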

The Class Manager, Host, and other objects in the system are all configured with strict access control. Callers must present credentials to gain authorization. The MRI collection application and its Legion infrastructure are owned by, and accessible only to, a small set of Legion users at Harvard. These users can centrally monitor and configure the system using Legion tools that provide views of all the hosts, objects, and so forth that are running or down.
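This strict access control rests on each object's mandatory MayI method: every incoming invocation is checked against the caller's credentials before it runs, and the default policy is an access control list that can name groups. The Python sketch below illustrates that pattern under stated assumptions. The names `LegionObject`, `may_i`, `HostObject`, `ClassManagerObject`, and `grant` are hypothetical; credentials are plain values here rather than RSA-signed sets of rights, so authentication is assumed, not checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    """Identifies the caller. Legion credentials are free-form rights
    RSA-signed by a responsible client; signing and verification are
    omitted here, so the credential is simply trusted."""
    user: str  # hypothetical user LOID, e.g. "user:alice"


class AccessControlList:
    """Per-method entries naming users or groups, in the style of a default MayI."""

    def __init__(self, groups=None):
        self.groups = groups if groups is not None else {}  # group -> set of users
        self.entries = {}                                   # method -> set of principals

    def allow(self, method, principal):
        self.entries.setdefault(method, set()).add(principal)

    def permits(self, method, user):
        allowed = self.entries.get(method, set())
        return user in allowed or any(
            user in self.groups.get(p, set()) for p in allowed)


class LegionObject:
    """Every invocation passes through may_i before the method runs."""

    def __init__(self, acl):
        self.acl = acl

    def may_i(self, method, credential):
        return self.acl.permits(method, credential.user)

    def invoke(self, method, credential, *args):
        if not self.may_i(method, credential):
            raise PermissionError(f"{credential.user} may not call {method}")
        return getattr(self, method)(*args)


class HostObject(LegionObject):
    def start_object(self, name):
        return f"started {name}"


class ClassManagerObject(LegionObject):
    def list_instances(self):
        return ["mri-collector@leaf1"]


def grant(obj, method, principal):
    """One configuration tool works for any object type."""
    obj.acl.allow(method, principal)
```

Because hosts and managers share the same check, a single `grant` helper configures both, which is the uniformity argument made in the Globus comparison.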

Climate  modeling 

Climate modeling has progressed beyond basic atmospheric simulations to include multiple aspects of the earth system, such as full-depth ocean models, high-resolution land-surface models, sea ice models, and chemistry models. Typically, these models come from different research groups at a variety of institutions, are written in different languages, and require different resources. As described in the fourth example in the sidebar "Challenges for a Wide-Area Operating System," coupled applications composed from existing models require the ability to coordinate existing components and to manage combined resources.

Legion's ability to combine and add value to existing components to create more complex applications fits nicely with this application. To construct the coupled climate model system, developers use the existing simulations as implementations for two new Legion object types: GlobalModel and MesoscaleModel. In doing so, they modify the simulations to enable linkage to a Legion object interface (described in IDL), and modify the I/O calls in the models to use Legion file objects in place of the local file system. Each new model object supports a method to request the execution of a simulation time interval. Coordination and coupling of the model objects is accomplished through the use of a Legion Coupler object. This object also transforms data from each model into the format required by the other (for example, the models employ geographic grids that differ by an order of magnitude in resolution).
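The coupling pattern above (each model exposes a "run one time interval" method, and a Coupler moves data between grids of different resolution) can be sketched in Python. This is a toy under explicit assumptions: `ToyModel`, `run_interval`, `refine`, `coarsen`, and `Coupler` are hypothetical names; both grids are taken to be 1-D and to cover the same domain, with the fine grid exactly `factor` times denser; and piecewise-constant refinement plus block averaging stand in for whatever regridding the real Coupler performs.

```python
class ToyModel:
    """Hypothetical stand-in for a wrapped simulation on a 1-D grid."""

    def __init__(self, field):
        self.field = list(field)
        self.forcing = None        # values supplied by the coupler

    def run_interval(self, dt):
        """Advance one time interval: a toy relaxation toward the forcing."""
        if self.forcing is not None:
            self.field = [x + dt * (f - x) for x, f in zip(self.field, self.forcing)]
        return self.field


def refine(coarse, factor):
    """Coarse -> fine: repeat each coarse value in `factor` fine cells."""
    return [v for v in coarse for _ in range(factor)]


def coarsen(fine, factor):
    """Fine -> coarse: average each block of `factor` fine cells."""
    if len(fine) % factor:
        raise ValueError("fine grid length must be a multiple of factor")
    return [sum(fine[i:i + factor]) / factor for i in range(0, len(fine), factor)]


class Coupler:
    """Steps both models, then exchanges regridded fields between them."""

    def __init__(self, global_model, mesoscale_model, factor=10):
        self.global_model = global_model
        self.mesoscale_model = mesoscale_model
        self.factor = factor       # assumed resolution ratio (order of magnitude)

    def step(self, dt):
        # Both models advance using the fields exchanged at the previous step.
        g = self.global_model.run_interval(dt)
        m = self.mesoscale_model.run_interval(dt)
        self.mesoscale_model.forcing = refine(g, self.factor)
        self.global_model.forcing = coarsen(m, self.factor)
```

Exchanging data only between intervals keeps each model unaware of the other's grid; all format translation lives in the Coupler.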

Legion also satisfies this application's requirements for managing resources. For example, application developers can configure the Class Manager for the GlobalModel object to know that a Cray T3E is required for this object type. When the model becomes available on the IBM SP, they can reconfigure the Class Manager with a single command to account for this new resource selection possibility. When a user wants to run the complete coupled simulation, a standard Legion component, the Scheduler object, coordinates the acquisition of all needed resources (such as a T3E or SP to run the global model, a T90 to run the mesoscale model, and a workstation to host the Coupler object). Regardless of the resources selected, Legion automatically takes care of installing the needed application components at the target sites, and it uses the appropriate interfaces for the local site's task and storage allocation.
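The division of labor described here (Class Managers record which platforms suit each object type; a Scheduler acquires one resource per component) can be sketched as follows. The names `ClassManager`, `allow_platform`, `Resource`, and `schedule` are hypothetical, the placement rule is a simple greedy first fit chosen for illustration, and installing components at the target sites is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ClassManager:
    """Hypothetical per-type manager listing platforms that can host instances."""
    type_name: str
    platforms: set = field(default_factory=set)

    def allow_platform(self, platform):
        """Models the single reconfiguration command described in the text."""
        self.platforms.add(platform)


@dataclass
class Resource:
    name: str
    platform: str
    busy: bool = False


def schedule(managers, resources):
    """Greedy, all-or-nothing acquisition of one resource per object type."""
    placement, taken = {}, set()
    for mgr in managers:
        choice = next((r for r in resources
                       if not r.busy and r.name not in taken
                       and r.platform in mgr.platforms), None)
        if choice is None:
            # Nothing has been marked busy yet, so failing here acquires nothing.
            raise RuntimeError(f"no suitable resource for {mgr.type_name}")
        taken.add(choice.name)
        placement[mgr.type_name] = choice.name
    for r in resources:
        if r.name in taken:
            r.busy = True
    return placement
```

Committing only after every component has a resource avoids holding a supercomputer while waiting on a missing workstation; greedy first fit can still miss assignments that a smarter matcher would find.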


We are continuing to develop higher level services in Legion as we acquire more information from applications. For example, broad classes of applications can profit from similar fault-response techniques. To address this need, we are designing drop-in fault tolerance modules based on the existing detection and reporting infrastructure. We also plan to develop new application tools, such as an integrated Legion debugger, and to port application toolkits such as NetSolve (http://www.cs.utk.edu/netsolve). These efforts are guided by our close collaborations with an expanding set of application groups, such as the Harvard Medical School and the climate modeling groups mentioned earlier. Finally, we are actively engaged in commercializing the Legion platform for use in Internet and enterprise settings. For more information, visit the Legion site (http://legion.virginia.edu). ❖